Rational design of novel coumarins: A potential trend for antioxidants in cosmetics

Coumarins are well known for their antioxidant effect and aromatic properties; thus, they are commonly added as ingredients in cosmetics and personal care products. Quantitative structure-activity relationship (QSAR) modeling is an in silico method widely used to facilitate rational design and structural optimization of novel drugs. Herein, QSAR modeling was used to elucidate key properties governing the antioxidant activity of a series of reported coumarin-based antioxidant agents (1-28). Several types of descriptors (calculated with four software packages, i.e., Gaussian 09, Dragon, PaDEL and Mold2) were used to generate three multiple linear regression (MLR) models with good predictive performance (Q2LOO-CV = 0.813-0.908; RMSELOO-CV = 0.150-0.210; Q2Ext = 0.875-0.952; RMSEExt = 0.104-0.166). QSAR analysis indicated that the number of secondary amines (nArNHR), polarizability (G2p), electronegativity (D467, D580, SpMin2_Bhe, and MATS8e), van der Waals volume (D491 and D461), and H-bond potential (SHBint4) are important properties governing antioxidant activity. The constructed models were also applied to guide the in silico rational design of an additional set of 69 structurally modified coumarins with improved antioxidant activity. Finally, a set of 9 promising newly designed compounds was highlighted for further development. Structure-activity analysis also revealed key features required for potent activity, which would be useful for guiding future rational design. Overall, our findings demonstrate that QSAR modeling can be a facilitating tool for the successful development of bioactive compounds for health and cosmetic applications.


INTRODUCTION
Free radicals (or oxidants) are highly reactive molecules containing an unpaired electron, which are generated as by-products of physiological processes and intracellular pathways (Valko et al., 2007; Winyard et al., 2005). These oxidants are well known for their harmful and deleterious effects on cellular components (i.e., DNA, proteins and lipids). Under normal conditions, these radicals are scavenged or neutralized by antioxidant defense mechanisms (i.e., endogenous antioxidant molecules and antioxidant enzymes) to prevent cellular oxidative damage. However, the oxidative balance shifts when radicals are overproduced or the antioxidant defense is depleted. This situation leads to excessive accumulation of free radicals and oxidative stress. Oxidative damage is involved in the pathogenesis and progression of many chronic and age-related diseases (i.e., cancer, diabetes mellitus, neurodegenerative diseases, and cardiovascular diseases) (Valko et al., 2007; Winyard et al., 2005). Furthermore, free radicals have been recognized as one of the factors contributing to skin aging (Bogdan Allemann and Baumann, 2008). Antioxidant compounds are well recognized for their wide-ranging health applications, especially in the cosmeceutical area. The addition of antioxidants as active ingredients in cosmetics and personal care products has been widely documented (Kusumawati and Indrayanto, 2013; Lupo, 2001). Therefore, the discovery of novel potent antioxidant compounds, both from chemical synthesis (Prachayasittikul et al., 2009a; Subramanyam et al., 2017; Worachartcheewan et al., 2012) and from naturally derived sources (Elansary et al., 2018; Krishnaiah et al., 2011; Prachayasittikul et al., 2008, 2009b, 2013; Wongsawatkul et al., 2008), has been noted as an attractive research area, especially for cosmetic applications (Kusumawati and Indrayanto, 2013; Lupo, 2001).
Coumarins, known as benzopyrones, are natural secondary metabolites bearing fused benzene and α-pyrone rings (Witaicenis et al., 2014). Naturally derived coumarins are found in a wide range of plants (Lee et al., 2007; Rodríguez-Hernández et al., 2019; Saleem et al., 2019; Venditti et al., 2019). Coumarins display a variety of biological activities including antimicrobial (Arshad et al., 2011), antioxidant (Erzincan et al., 2015), anticancer (Nasr et al., 2014), and anti-inflammatory (Witaicenis et al., 2014) activities. Although synthetic coumarins were banned for oral products due to their potential toxicities, they are attractive for topical uses due to their high skin penetration (Stiefel et al., 2017). Additionally, coumarins are widely used as fragrance ingredients in cosmetics and personal care products because of their sweet herbaceous scent (Ma et al., 2015; Stiefel et al., 2017). The antioxidant property of coumarins and their protective effects against skin photo-aging have also been noted in the cosmetic area (Kostova et al., 2011; Lee et al., 2007). Previously, a set of synthesized coumarin derivatives containing 2-methylbenzothiazolines, sulphonamides, and amides was reported to exhibit antioxidant activity with IC50 values in the range of 0.024-2.888 mM (Khoobi et al., 2011; Saeedi et al., 2014). However, a deeper understanding of structure-activity relationships (SAR) and mechanism of action is still necessary for effective rational design of coumarin-based antioxidant agents (Kostova et al., 2011).
Computational approaches have been widely recognized to facilitate drug development and increase its success rate (Nantasenamat and Prachayasittikul, 2015; Prachayasittikul et al., 2015a). Quantitative structure-activity relationship (QSAR) modeling is an in silico method that reveals the relationship between the chemical structures of compounds and their biological activities. QSAR modeling provides useful findings such as key features, properties, or moieties that are required for potent activity, which benefit further rational design of related compounds. Success stories of QSAR-driven rational design of several classes of promising lead compounds have been documented for anticancer agents (Prachayasittikul et al., 2015b), aromatase inhibitors (Prachayasittikul et al., 2017), and sirtuin-1 activators (Pratiwi et al., 2019). In the cosmetic area, QSAR modeling has been employed to improve the understanding of the SAR of tyrosinase inhibitors (Gao, 2018; Khan, 2012).
Accordingly, this study aims to construct QSAR models to elucidate the SAR of a set of antioxidant coumarin derivatives (1-28, Figure 1) originally reported by Khoobi et al. (2011) and Saeedi et al. (2014). Herein, QSAR models were constructed using the multiple linear regression (MLR) algorithm to clearly demonstrate the linear relationship along with an insightful SAR analysis. In an attempt to obtain robust and validated QSAR models, chemical descriptors were generated using four different software packages (i.e., Gaussian 09, Dragon, PaDEL and Mold2) to increase the variety of represented physicochemical properties. Subsequently, an additional set of structurally modified compounds was rationally designed based on key findings of the constructed models, and their antioxidant activities were predicted to reveal the promising ones with potential for further synthesis and development.

Data set
A data set of twenty-eight coumarin-based antioxidants (1-28, Figure 1) was retrieved from the literature (Khoobi et al., 2011; Saeedi et al., 2014); their antioxidant activities are presented in Table 1. All tested compounds were evaluated by the 1,1-diphenyl-2-picrylhydrazyl (DPPH) assay (detailed methodology is provided in the original literature (Khoobi et al., 2011; Saeedi et al., 2014)). The activity was denoted as an IC50 value (mM), which indicates the concentration of the compound that inhibits 50 % of the generated DPPH radicals in the experimental setting. As a part of data pre-processing, the unit of the IC50 values was converted from mM to M, and the IC50 values were further transformed into pIC50 (−log IC50) by taking the negative logarithm to base 10 as shown in Table 1. A compound with high pIC50 (low IC50) has high antioxidant activity. A schematic workflow of QSAR model development is provided in Figure 2.
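The unit conversion and logarithmic transformation described above can be illustrated with a minimal sketch. The function name `ic50_mM_to_pic50` is hypothetical and introduced only for illustration; the two input values are the extremes of the IC50 range quoted in the Introduction (0.024 and 2.888 mM).

```python
import math


def ic50_mM_to_pic50(ic50_mM: float) -> float:
    """Convert an IC50 given in millimolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_mM <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_M = ic50_mM * 1e-3  # mM -> M
    return -math.log10(ic50_M)


# Extremes of the reported IC50 range (mM)
for ic50 in (0.024, 2.888):
    print(f"IC50 = {ic50} mM -> pIC50 = {ic50_mM_to_pic50(ic50):.3f}")
# prints pIC50 = 4.620 and pIC50 = 2.539
```

The second value matches the pIC50 of 2.539 reported for the least active compound, which suggests the transformation is the one applied in Table 1.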

Molecular structure optimization
Molecular structures of the coumarin derivatives were constructed with GaussView (Dennington et al., 2003) and subjected to geometry optimization with Gaussian 09 (Revision A.02) (Frisch et al., 2009) at the semi-empirical level using Austin Model 1 (AM1), followed by density functional theory (DFT) calculation using Becke's three-parameter hybrid method and the Lee-Yang-Parr correlation functional (B3LYP) together with the 6-31G(d) basis set.

Descriptor calculation and feature selection
The physicochemical properties (i.e., quantum chemical and molecular descriptors) were generated with different software packages including Gaussian 09, Dragon version 5.5 (Talete, 2007), PaDEL version 2.20 (Yap, 2011) and Mold2 version 2.0 (Hong et al., 2008). The calculated descriptors are numerical values that represent properties of the compounds, and were further used as predictors (X variables) for QSAR model construction. The calculated descriptors are described as follows.
An additional set of molecular descriptors was calculated with PaDEL to give 1,444 1D and 2D descriptors, and with Mold2 to generate 777 descriptors encoding the 2D chemical structure information. Before the calculation, the molecular structures were saved as *.smi files and then converted to *.mol files using OpenBabel version 2.3.2 (The Open Babel Package 2015). The *.mol files were used as the input data for calculation by PaDEL and Mold2.
Descriptor selection was performed to filter a set of important informative descriptors from the whole set of descriptors. Feature selection was initially performed by stepwise multiple linear regression (MLR) using SPSS Statistics 18.0 (SPSS Inc., USA), followed by determination of intercorrelation using Pearson's correlation coefficient with a cutoff value of |r| ≥ 0.9. Any pair of descriptors with |r| ≥ 0.9 was defined as highly correlated, and one descriptor of the pair was excluded.
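The intercorrelation filter can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the function name `drop_intercorrelated`, the toy descriptor names, and the rule of keeping the first descriptor of each correlated pair are assumptions, since the text does not state which member of a pair was excluded.

```python
import numpy as np


def drop_intercorrelated(X, names, cutoff=0.9):
    """Return the descriptor names kept after removing one member of every
    pair whose absolute Pearson correlation is >= cutoff.

    Assumption: the descriptor that appears first is the one retained.
    """
    r = np.corrcoef(X, rowvar=False)  # descriptor-by-descriptor correlation matrix
    kept = []
    for j in range(X.shape[1]):
        # keep column j only if it is not highly correlated with any kept column
        if all(abs(r[j, k]) < cutoff for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Toy data (hypothetical descriptors, not values from the study)
d1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
d2 = 2.0 * d1 + 1.0                                # perfectly correlated with d1
d3 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # weakly correlated with d1
X = np.column_stack([d1, d2, d3])

print(drop_intercorrelated(X, ["d1", "d2", "d3"]))  # ['d1', 'd3']
```

Constant-valued columns should be removed before calling such a filter, because their Pearson correlation is undefined.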

Data splitting
The data set of coumarin derivatives (1-28) was randomly split: 85 % (23 compounds) of the original data set was used as the training and leave-one-out cross-validation (LOO-CV) sets, and 15 % (5 compounds) was used as the external set. The training set was employed to generate the QSAR models, whereas the LOO-CV and external sets were used to evaluate the models. LOO-CV was performed for internal validation by leaving one sample out of the data set to be used as the testing set while the remaining N−1 samples were used as the training set (Prachayasittikul et al., 2014). This sampling process was repeated iteratively until every sample in the data set had been used as the testing set.
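The splitting and LOO-CV procedure can be sketched as below. This is an illustrative outline under stated assumptions, not the Weka workflow used in the study: the helper names (`fit_mlr`, `predict_mlr`, `loo_cv`) are hypothetical, the descriptor matrix and activities are random placeholders, and Q2 is computed with one common definition (1 − PRESS/total sum of squares). Only the compound numbering and the external-set members (1, 6, 15, 21 and 27) are taken from the paper.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [B0, B1, ..., Bn]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict Y = B0 + sum_n Bn * Xn for each row of X."""
    return coef[0] + X @ coef[1:]


def loo_cv(X, y):
    """Leave-one-out predictions with the corresponding Q2 and RMSE."""
    n = len(y)
    y_pred = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i              # hold out sample i
        coef = fit_mlr(X[mask], y[mask])      # refit on the remaining N-1 samples
        y_pred[i] = predict_mlr(coef, X[i:i + 1])[0]
    press = np.sum((y - y_pred) ** 2)         # predictive residual sum of squares
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    rmse = np.sqrt(press / n)
    return y_pred, q2, rmse


# Compound numbering and external-set members as named in the paper
ids = np.arange(1, 29)
is_ext = np.isin(ids, [1, 6, 15, 21, 27])

# Placeholder data: random descriptors and a synthetic response (NOT the study's values)
rng = np.random.default_rng(42)
X = rng.normal(size=(28, 4))
y = 3.0 + X @ np.array([0.5, -0.3, 0.2, 0.1]) + 0.05 * rng.normal(size=28)

X_tr, y_tr = X[~is_ext], y[~is_ext]
_, q2_loo, rmse_loo = loo_cv(X_tr, y_tr)

# External validation: fit once on the training set, predict the held-out compounds
coef = fit_mlr(X_tr, y_tr)
rmse_ext = np.sqrt(np.mean((y[is_ext] - predict_mlr(coef, X[is_ext])) ** 2))
print(f"N_train = {len(y_tr)}, Q2_LOO = {q2_loo:.3f}, "
      f"RMSE_LOO = {rmse_loo:.3f}, RMSE_ext = {rmse_ext:.3f}")
```

Refitting the model inside the loop is what makes the LOO-CV estimate an internal validation: each prediction comes from a model that never saw the held-out compound.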

Multivariate analysis
QSAR models were generated using MLR according to Eq. (1):

$$Y = B_0 + \sum_{n} B_n X_n \qquad (1)$$

where Y is the antioxidant activity (pIC50), B0 is the intercept, and Bn are the regression coefficients of the descriptors Xn. The MLR method was performed using the Waikato Environment for Knowledge Analysis (Weka), version 3.4.5 (Witten et al., 2011).

Molecular descriptors selection
Chemical structures of the compounds and their antioxidant activities (Table 1) were used for construction of predictive models. The compounds were geometrically optimized with the semi-empirical AM1 method followed by DFT at the B3LYP/6-31G(d) level using Gaussian 09 to obtain lower-energy conformers. Thirteen quantum chemical descriptors were extracted from the optimized structures. The optimized structures were subsequently used as input files for calculating an additional set of 3,224 molecular descriptors (0D-3D) using Dragon software. Descriptors with constant values or multi-collinearity were identified and removed to give a final set of 1,489 descriptors. In addition, the original molecular structures of the compounds were saved in *.smi file format and converted into *.mol files using OpenBabel version 2.3.2. These *.mol files were then used as input files for descriptor calculation using Mold2 and PaDEL to obtain sets of 777 Mold2 2D descriptors and 1,444 PaDEL 0D-2D descriptors, respectively. Consequently, feature selection was performed to select a set of informative descriptors from the whole calculated set. The selected descriptors and their values are presented in Tables 2 and 3, respectively. Furthermore, the intercorrelation matrix between pairs of molecular descriptors was calculated using Pearson's correlation coefficient (r) (Supplementary Tables 1-3). A cutoff value of |r| ≥ 0.9 was used to determine intercorrelation. The results showed that there was no intercorrelation within the set of selected descriptors, as displayed by low |r| values (< 0.9), which suggested that each descriptor was independent of the other descriptors. Finally, a set of 14 selected descriptors was further employed to construct 3 QSAR models (according to the type of software used to calculate the descriptor values) for predicting the antioxidant activity of the coumarin derivatives.

Table 2 (fragment; only the following entries are recoverable). Definitions of selected descriptors.
SpMin2_Bhe: Smallest absolute eigenvalue of Burden modified matrix - n 2 / weighted by relative Sanderson electronegativities; 2D (Burden modified eigenvalues)
MATS8e: Moran autocorrelation - lag 8 / weighted by Sanderson electronegativities; 2D (Autocorrelation)
SssCH2: Sum of atom-type E-State: -CH2-; 2D (Atom type electrotopological state)

QSAR models
Descriptors obtained from these software packages have been successfully used in QSAR modeling of antioxidant (Alisi et al., 2018; Rastija et al., 2018), antimicrobial (Alyar et al., 2009; Basic et al., 2014; Podunavac-Kuzmanović et al., 2009), anticancer (Sławiński et al., 2017; Suvannang et al., 2018) and antiviral (Saavedra et al., 2018; Worachartcheewan et al., 2019) activities. Herein, three models were separately constructed based on the types of key descriptors (i.e., model 1: Dragon descriptors, model 2: Mold2 descriptors, and model 3: PaDEL descriptors). A set of 14 selected informative descriptors (as independent variables, Table 2) and the antioxidant activities (pIC50 values as dependent variables) of the studied compounds were included in the data sets for construction of QSAR models using Eq. (1). Before building the models, the data set of coumarin derivatives (1-28) was split into training, LOO-CV, and external sets. The training set was used to construct the models using the MLR algorithm, whereas both the LOO-CV and external sets were utilized for validating the constructed models. Compounds 1, 6, 15, 21 and 27 were randomly selected as the external set, while the remaining 23 compounds (i.e., 2-5, 7-14, 16-20, 22-26 and 28) were employed as the training set. As a result, three QSAR models (models 1-3) were successfully constructed for predicting the antioxidant activities (pIC50 values) of the studied coumarin analogs. In the model statistics, NTr, NLOO-CV and NExt are the numbers of compounds in the training, LOO-CV and external sets, respectively, and R2Adj is the adjusted R2.
Four molecular descriptors calculated with Dragon software were used as predictors to construct QSAR model 1, as shown in Eq. (2). Statistical parameters indicating the predictive performance of the model are summarized in Table 4. Overall, the three constructed models provided satisfactory results as indicated by their statistical parameters such as R2, Q2, RMSE, F ratio and PRESS values. The R2 and Q2 values of QSAR models are considered acceptable when R2 > 0.6 and Q2 > 0.5 (Golbraikh and Tropsha, 2002; Nantasenamat et al., 2010). These parameters of all constructed models were in the acceptable range (Q2LOO-CV = 0.813-0.908; Q2Ext = 0.875-0.952; Frimayanti et al., 2011; Rastija et al., 2018). The statistical (Table 4) and graphical (Figure 3) results showed that the QSAR models (models 1-3) gave reliable agreement between the experimental and the predicted antioxidant values. Furthermore, the plots of experimental activity against residual values (Figures 3b, 3d and 3f) displayed residuals distributed on both sides of zero, indicating that there is no systematic error in the models (Jalali-Heravi and Kyani, 2004). Therefore, QSAR models 1-3 can be considered reliable for predicting the antioxidant activity of coumarin derivatives. Considering the Q2 of the external set, the Dragon descriptors gave the highest prediction quality (model 1: Q2Ext = 0.952), followed by the PaDEL descriptors (model 3: Q2Ext = 0.885) and the Mold2 descriptors (model 2: Q2Ext = 0.875).
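For reference, the statistical parameters named above are commonly defined as follows. The source does not state the exact formulas used, so these are standard textbook forms given as an assumption; in particular, several variants of the external Q2 exist. Here y_i is the experimental pIC50, ŷ_i the fitted value, ŷ_(i) the prediction for compound i from a model built without it, ȳ the mean experimental pIC50, N the number of compounds in the set considered, and p the number of descriptors in the model.

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2},
&
R^2_{\mathrm{Adj}} &= 1 - (1 - R^2)\,\frac{N - 1}{N - p - 1}, \\
\mathrm{PRESS} &= \sum_{i=1}^{N} \bigl(y_i - \hat{y}_{(i)}\bigr)^2,
&
Q^2_{\mathrm{LOO\text{-}CV}} &= 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \\
\mathrm{RMSE} &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2},
&
Q^2_{\mathrm{Ext}} &= 1 - \frac{\sum_{j \in \mathrm{Ext}} (y_j - \hat{y}_j)^2}{\sum_{j \in \mathrm{Ext}} (y_j - \bar{y}_{\mathrm{Tr}})^2}.
\end{align}
```

Under these definitions, a lower PRESS gives a Q2 closer to 1, which is why the acceptance thresholds are stated as lower bounds on R2 and Q2.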

Structure-activity relationship (SAR)
Regression coefficient values of the key descriptors (as independent variables) in the QSAR models define the degree or weight of their influence on the dependent variable. To gain insights into SAR, the coumarin derivatives (1-28, Figure 1) were categorized into 3 groups according to their core structures, i.e., thiazoles (group I: 1-9 and 18), sulfonamides (group II: 10-17) and amides (group III: 19-28). Group I compounds (1-9 and 18) showed antioxidant activity (Table 1) with a pIC50 range of 2.539-4.612. The most potent and the least potent compounds of group I were 5 (pIC50 = 4.612) and 6 (pIC50 = 2.539), respectively. Among the group II compounds (10-17), compound 11 was the most active (pIC50 = 3.180), and 14 was the least active. For the group III amides (19-28), compound 21 displayed the most potent activity (pIC50 = 3.027) and compound 19 exhibited the lowest activity with a pIC50 of 2.640.
According to the significant descriptors in models 1-3, secondary (sec-) amine, polarizability, electronegativity and H-bonding had a positive effect on the antioxidant activity. This is seen in the most potent coumarin 5, which bears a sec-amine (part of the aromatic thiazole) and a 7-OH group (on the coumarin ring) with H-bonding and polarizability properties. On the other hand, the tertiary (tert-) amine 6 without a 7-OH group exerted the lowest activity among the coumarin derivatives 1-28. This implies that the sec-amine (-NH-) and OH groups, as H-bonding and polarizing groups, are important for better activity.
It should be noted that the most potent modified compounds had higher values of the H-bonding descriptor (SHBint4 = 2.048-16.903, Supplementary Table 7) than their parent compounds (SHBint4 = 0.000-7.5875, Table 3). Thus, SHBint4 might be an important descriptor governing potent antioxidant activity.

CONCLUSION
Understanding SAR is important for improving bioactivities and pharmacokinetic properties in the development of potent and safe cosmetic products. Herein, a set of coumarin derivatives (1-28) with antioxidant activity was used to construct three QSAR models (1-3) using three different descriptor types and the MLR method. Statistical evaluation showed that the three generated QSAR models provide good reliability and comparable predictive performance (Q2LOO-CV = 0.813-0.908; RMSELOO-CV = 0.150-0.210; Q2Ext = 0.875-0.952; RMSEExt = 0.104-0.166). In addition, the good correlation obtained from model prediction suggests that the selected significant descriptors (i.e., nArNHR, H-bonding, polarizability, van der Waals volume and electronegativity properties) are good representatives for revealing the correlation between the chemical structures of the compounds and their antioxidant activities. An application of the constructed models was demonstrated by rationally designing an additional set of 69 structurally modified coumarins based on the key descriptors, whose antioxidant activities were predicted using the obtained QSAR models (1-3). Most of the rationally designed compounds displayed improved antioxidant activity compared with their parent compounds. In particular, the top three newly designed compounds (5h, 4g and 3n) showed high H-bonding (SHBint4) descriptor values, which may play a part in governing the most improved antioxidant activity. Finally, a set of promising newly designed coumarin analogs was highlighted for their potential to be further developed as potent antioxidants. The SAR findings also provide beneficial guidelines for the rational design of novel coumarin-based compounds with potent antioxidant effect for cosmetic applications.

Supplementary information
Supplementary information is available on the EXCLI Journal website.